iT邦幫忙

2025 iThome 鐵人賽

DAY 4
0
佛心分享-IT 人自學之術

學習 LLM系列 第 4

Day4 文本生成

  • 分享至 

  • xImage
  •  

生成英文文本 (GPT-2)
> 用 pipeline 生成
`from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Once upon a time,"
out = generator(
prompt,
max_new_tokens=60,
do_sample=True,
temperature=0.9,
top_k=50,
top_p=0.95,
num_return_sequences=2
)

for i, o in enumerate(out):
print("=== sequence", i+1, "===\n")
print(o["generated_text"])
print("\n")`

  • text-generation 是文本生成
  • model="gpt2" 是指定使用的模型
  • prompt : 模型會以此為起點繼續續寫
  • max_new_tokens 是指模型要另外產生多少 token(詞元)
  • do_sample : 是否用取樣來產生結果
    • True : 隨機抽樣
    • False: 採用貪婪或 beam search
  • temperature : 調整隨機性
    • 小(0.1〜0.5):比較保守、重複率低、結果更確定
    • 1.0:正常
    • 大(>1):更隨機、更發散
  • top_k=50 : 只取機率最高的前 k 個 token 中取樣(top-k sampling),限制極端低機率 token
  • top_p=0.95
    • Nucleus (top-p) sampling:選擇累積機率總和達到 p 的最小 token 集合再取樣。常跟 top_k 或
      temperature 組合使用
  • num_return_sequences=2:一次請模型回傳幾個不同候選(在 do_sample=True 時可看到多樣輸出)
    https://ithelp.ithome.com.tw/upload/images/20250918/20169173Kqv22oREeq.png

實驗

  1. 比較短 vs. 長的生成差異
    `print("=== max_new_tokens = 30 ===")
    out = generator(prompt, max_new_tokens=30, do_sample=True)
    print(out[0]["generated_text"], "\n")

print("=== max_new_tokens = 80 ===")
out = generator(prompt, max_new_tokens=80, do_sample=True)
print(out[0]["generated_text"], "\n")
`

  • 結果 :
    Setting pad_token_id to eos_token_id:50256 for open-end generation.
    === max_new_tokens = 30 ===
    Setting pad_token_id to eos_token_id:50256 for open-end generation.
    Once upon a time, the two races were separated by a large, open field of lava. Then the lava burst out of the ground, and the two races began to fight

=== max_new_tokens = 80 ===
Once upon a time, we were all part of the same group of individuals.

Now, all the data we have, all the data we now have about women's health, our sexual and reproductive health, and our health, all of that is being produced in a society that is essentially misogynistic, that is in a state of total denial: they're not just doing research, they're doing research. And you

  1. 改變 temperature(隨機性)
print("=== temperature = 0.2 (保守) ===")
out = generator(prompt, max_new_tokens=50, do_sample=True, temperature=0.2)
print(out[0]["generated_text"], "\n")

print("=== temperature = 1.2 (更隨機) ===")
out = generator(prompt, max_new_tokens=50, do_sample=True, temperature=1.2)
print(out[0]["generated_text"], "\n")
  • 結果 :
    Setting pad_token_id to eos_token_id:50256 for open-end generation.
    === temperature = 0.2 (保守) ===
    Setting pad_token_id to eos_token_id:50256 for open-end generation.
    Once upon a time, the world was a land of peace and prosperity, and the people were free to live their lives as they pleased. But now, the world is a land of terror and bloodshed, and the people are afraid of the people.

The people are

=== temperature = 1.2 (更隨機) ===
Once upon a time, there were a few people who did nothing but watch the screen play on TV, like if somebody who worked at your company gave you a job in a particular field or that you called up some friend, so all you had to do was just watch a

  • 越低(0.2)越安全、結果更像「背課文」
  • 越高(1.2)更有創意,但可能會亂
  1. 改變 top_k(只從前 k 高機率詞選)
print("=== top_k = 10 (限制更嚴格) ===")
out = generator(prompt, max_new_tokens=50, do_sample=True, top_k=10, temperature=0.9)
print(out[0]["generated_text"], "\n")

print("=== top_k = 100 (選擇更自由) ===")
out = generator(prompt, max_new_tokens=50, do_sample=True, top_k=100, temperature=0.9)
print(out[0]["generated_text"], "\n")
  • 結果 :
    Setting pad_token_id to eos_token_id:50256 for open-end generation.
    === top_k = 10 (限制更嚴格) ===
    Setting pad_token_id to eos_token_id:50256 for open-end generation.
    Once upon a time, the world was like this:

The sky was white and clear as if a sun was shining. The sun was shining, and it was all in the world.

A moment later the world was like this:

The world was

=== top_k = 100 (選擇更自由) ===
Once upon a time, the Lord Almighty summoned the heavens, the earth and all the things that exist, in order that His creatures and the Creator might be filled of His glorious knowledge."23 Now it is written, "You will know that God was the Lord Himself from before

  1. 改變 top_p(nucleus sampling)
print("=== top_p = 0.8 (保守,少數詞) ===")
out = generator(prompt, max_new_tokens=50, do_sample=True, top_p=0.8, temperature=0.9)
print(out[0]["generated_text"], "\n")

print("=== top_p = 0.95 (更開放) ===")
out = generator(prompt, max_new_tokens=50, do_sample=True, top_p=0.95, temperature=0.9)
print(out[0]["generated_text"], "\n")
  • 結果 :
    Setting pad_token_id to eos_token_id:50256 for open-end generation.
    === top_p = 0.8 (保守,少數詞) ===
    Setting pad_token_id to eos_token_id:50256 for open-end generation.
    Once upon a time, the people of the world are looking at one another in awe, with the possibility of having a real understanding of their own experiences. The human spirit is at the center of that understanding, and the human mind is at the center of that understanding, too

=== top_p = 0.95 (更開放) ===
Once upon a time, even when he was still young, he would have dreamed of becoming a priest."

Then in his youth, he had a tendency toward the world of the priest. His imagination could conceive of the world of the priest. He even envisioned his own

  1. 一次產生多個結果
print("=== num_return_sequences = 3 ===")
out = generator(prompt, max_new_tokens=50, do_sample=True, temperature=0.9, num_return_sequences=3)
for i, o in enumerate(out):
    print(f"--- sequence {i+1} ---")
    print(o["generated_text"], "\n")
  • 結果 :
    Setting pad_token_id to eos_token_id:50256 for open-end generation.
    === num_return_sequences = 3 ===
    --- sequence 1 ---
    Once upon a time, the first thing that was heard in the room was a loud thump. Then was heard the sound of a heavy bell ringing through the room. The noise of the bell was followed by the sound of heavy footsteps against the walls, and the noise of

--- sequence 2 ---
Once upon a time, after I'd been writing about him I would not have been able to get up from my stool before we were set to go to bed. I would not have been able to sleep past 8 AM as I had been sitting there for hours. Every time

--- sequence 3 ---
Once upon a time, there were many things that were good about our company. We had great writers – writers that worked well within the company, and people that were extremely passionate with us.

With the help of my friends, a lot of folks found the company rewarding


上一篇
Day 3:第一個模型體驗
系列文
學習 LLM4
圖片
  熱門推薦
圖片
{{ item.channelVendor }} | {{ item.webinarstarted }} |
{{ formatDate(item.duration) }}
直播中

尚未有邦友留言

立即登入留言